Application of Neural Networks to the study of stellar model solutions

Abstract

Artificial neural networks (ANNs) have different applications in Astronomy, including data reduction and
data mining. In this work we propose the use of ANNs in the identification of stellar model solutions. We
illustrate this method by applying an ANN to the 0.8 M⊙ star CG Cyg B. Our ANN was trained using
60,000 different 0.8 M⊙ stellar models. With this approach we identify the models which reproduce CG
Cyg B's position in the HR diagram. We observe a correlation between the model's initial metal and
helium abundances which, in most cases, does not agree with a helium-to-metal enrichment ratio ΔY/ΔZ = 2.
Moreover, we identify a correlation between the model's initial helium/metal abundance and both its age
and mixing-length parameter. Additionally, every model found has a mixing-length parameter below 1.3.
This means that CG Cyg B's mixing-length parameter is clearly smaller than the solar one. From this study
we conclude that ANNs are well suited to deal with the degeneracy of model solutions of solar-type stars.

Keywords: stars: evolution, stars: fundamental parameters, stars: interiors, stars: individual (CG Cyg B) 



1. Introduction 

The determination of stellar masses and ages is vital for different areas
of Astronomy ([...] 2007), such as planetary formation (Johnson et al. 2007).
These parameters can be derived by comparing the position of a star in the
HR diagram (HRD) against model predictions (Lastennet and Valls-Gabaud 2002).
This is known as an HRD analysis. Frequently, the computation of stellar
evolutionary models involves the use of parameters for which we do not
have strong observational constraints. Some of these are used to describe
mechanisms, such as convection and diffusion, which are insufficiently
known (Cassisi 2005). As a consequence, we tend to have more modelling
parameters than observational constraints, which results in a degeneracy of
model solutions. To reduce this problem, mass and age determinations are
currently made using isochrones computed assuming solar-scaled values for
the helium abundance and convection parameters. Yet this may lead to wrong
results.

An artificial neural network is composed of a collection of artificial
neurons organized in one of several possible structures, denoted
architectures. Each neuron receives inputs, processes them and delivers a
single output. A typical structure has three layers: input, intermediate
(called hidden layer) and output. The capacity of ANNs for analysing large
amounts of data (Bishop 1995) and their ability to deal with
multidimensional problems (Haykin 1999) make them valuable in different
areas of Astronomy, such as data reduction and data mining
(Tagliaferri et al. 2003). Indeed, ANNs have been used to perform an
automated classification of stellar spectra and to determine global stellar
parameters (luminosity L, effective temperature T_eff and metal abundance
[M/H]) from low-resolution spectra (Bailer-Jones 2000).

In this work we propose a new application of ANNs: the identification of
the stellar modelling parameters age, helium abundance (Y), metallicity (Z)
and mixing-length parameter α, as defined by Böhm-Vitense (1958), of
solar-type stars, taking into account their position in the HR diagram.
This allows us to analyse the degeneracy of stellar model solutions and to
evaluate its impact on the determination of stellar ages. We conclude by
presenting an application to the star CG Cyg B.

2. The Artificial Neural Network 

2.1. Overview and Motivation 

The challenge in this work is to perform a multidimensional curve fitting
for an astrophysical relationship between four variables, given a set of
temperatures and luminosities for a fixed mass (Cunha et al. 2003).

Since the relation between the four variables is not known a priori, the
multidimensional curve fitting is done using the inverse relation, from
four variables (Y, Z, α and age) to two variables (L and T_eff). A complete
mapping is then obtained by simulating the trained neural network (NN)
against all possible values of the four variables required for our case
study. Why use a NN to tackle this regression problem? First of all,
although a large amount of input/output data is available, we are not sure
how to relate it. Although the problem appears to be overwhelmingly
complex, there is clearly a solution. Indeed, it is relatively easy to
create a number of examples of the correct behaviour. The objective of
using a neural network model is to emulate a biological neural network
(Haykin 1999; Bishop 1995; Mitchell 1997; Duda et al. 2001).



NNs are composed of a collection of artificial neurons organized in several
different structures, denoted architectures. Each neuron receives inputs,
processes them and delivers a single output. A typical structure has three
layers: input, intermediate (called hidden layer) and output. There are
several flavours of artificial neural networks: feed-forward, recursive,
radial basis and many more (Haykin 1999). Back-propagation is a simple and
successful training procedure for feed-forward NNs and is very useful for
curve fitting (Duda et al. 2001). It is a supervised learning algorithm
that uses the delta rule to minimize the error between the pattern and the
classification at the output layer, back-propagating this error layer by
layer and updating the fitting parameters, i.e. the node weights. We have
chosen this paradigm, and in particular back-propagation NNs, because of
its ability to deal with complex, non-linear and parallel computation
(Haykin 1999). A useful reference on supporting tools for developing NN
applications is Demuth et al. (2008). Neural networks display remarkable
capabilities to derive meaning from complicated or imprecise data and can
be used to extract patterns and detect trends that are too complex to be
noticed by either humans or other computer techniques
(Stergiou and Siganos 1996). A trained neural network is similar to an
"expert" analysing known information. Another interesting characteristic of
NNs is their capability for handling multidimensionality, which is related
to three factors: dimensions, discovery and time (Bishop 1995;
Duda et al. 2001). The main advantages of NNs over other technologies are
based on the following characteristics (Stergiou and Siganos 1996):



1. Adaptive learning: An ability to learn how to 
do tasks based on the data given for training or 
initial experience. 

2. Self-Organisation: An NN can create its own 
organisation or representation of the information 
it receives during learning time. 

3. Real Time Operation: NN computations may 
be carried out in parallel, and special hardware 
devices are being designed and manufactured 
which take advantage of this capability. 

4. Fault Tolerance via Redundant Information Coding: Partial destruction
of a network leads to a corresponding degradation of performance.
However, some network capabilities may be retained even with major
network damage.
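
As an illustration of the delta rule mentioned in the overview above, the following minimal Python sketch applies it to a single linear neuron. The toy data (points on a straight line), the learning rate, the number of epochs and the function name train_delta_rule are assumptions made for this example only; they do not describe the training actually used in this work, which relied on the Levenberg-Marquardt algorithm in Matlab.

```python
# Delta rule on a single linear neuron:
#   weight <- weight + eta * (target - output) * input
# Toy data and hyper-parameters are illustrative assumptions.

def train_delta_rule(samples, eta=0.1, epochs=500):
    """Fit output = w * x + b to (x, target) pairs with the delta rule."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = w * x + b
            error = target - output      # difference between pattern and output
            w += eta * error * x         # weight update
            b += eta * error             # bias update (input fixed to 1)
    return w, b

# Noise-free points on the line target = 2 x + 1, with x between 0 and 1.
samples = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(11)]
w, b = train_delta_rule(samples)
print(f"w = {w:.4f}, b = {b:.4f}")
```

In a multi-layer network, back-propagation applies the same kind of update to every layer by passing the output error backwards through the weights.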

Given the characteristics and capabilities just described, NNs seem
appropriate and interesting for addressing astronomy problems such as the
one discussed here. Furthermore, NNs are being recognized as a useful and
flexible technology within the Astrophysics domain (Tagliaferri et al.
2003; Andreon et al. 2000). In summary, the motivation for using a NN is
its modelling ability and its versatility in handling classification
problems in large data sets with uncertain information, which makes it
suitable for studying degeneracies in stellar model solutions.

To demonstrate the capabilities of our NN approach for constraining
stellar modelling parameters and detecting degeneracies of stellar model
solutions, we used a relatively contained case study as a proof of concept
for our model. The case study is based on data previously computed by
Fernandes and Monteiro (2003), which comprise a regression of 60,000
stellar models with M = 0.8 M⊙ and six variables: 4 outputs (α, Y, Z and
age) and 2 inputs (T_eff and L). The stellar evolutionary models required
for this analysis were computed using the CESAM stellar evolutionary code
(Morel 1997). In these computations we used the same physical ingredients
adopted by Cunha et al. (2003) in the modelling of HR 1217.



Table 1: Variable ranges and their precision.

Variable    Y    Z    α    age (Myr)



2.2. The Neural Network model 

The NN model is based on the knowledge that stellar masses and ages can be
derived by comparing the position of a star in the HR diagram against the
predictions of evolutionary models (Fernandes and Monteiro 2003). This is
known as an HRD analysis (Fernandes and Monteiro 2003; Demuth et al. 2008;
Tagliaferri et al. 2003). To describe stellar interiors we frequently use
several parameters, such as the mixing-length parameter, whose values are
uncertain. This leaves us with an open problem: several combinations of
modelling parameters are able to reproduce the effective temperature and
luminosity of a given star, which prevents an accurate determination of its
mass and age. As mentioned before, we limited our computations to 0.8 M⊙
stellar models. This study can be particularly useful for choosing the best
regions of the Hertzsprung-Russell diagram (HRD) and for inferring stellar
parameters from the knowledge of luminosity and effective temperature.
It is well known that this inverse problem has no unique solution
(Fernandes and Monteiro 2003).

In the classical literature on neural networks there are many types of
architectures and reasoning schemes that can be used to model the problem
at hand (Haykin 1999; Bishop 1995; Mitchell 1997). From the set of
possible architectures for artificial neural networks (Feed-Forward,
Feed-Back, Network Layers, Perceptrons), we chose the Feed-Forward, where
information is constantly fed forward from one layer to the next, without
loops, associating inputs with outputs. We chose a back-propagation
algorithm because of its capability of working with large amounts of data
and its efficient tuning ability (Haykin 1999; Andreon et al. 2000;
Demuth et al. 2008). Considering that back-propagation neural networks
learn by example, we used 60% of the total input elements for training,
20% for checking the data set, and 20% for validation. To train the NN we
used the Levenberg-Marquardt algorithm (Demuth et al. 2008), which is
faster than the delta rule, to optimize the minimization of the error. As
transfer functions for the neurons, we used a sigmoid function in the first
(hidden) layer and a linear function in the second (output) layer.

The regression function (Y, Z, α, age) = f(T_eff, L) is the result of the
trained feed-forward neural network, with back-propagation used for the
error propagation. Since the objective is to identify and analyse stellar
model degeneracies, we address the problem by finding the inverse relation
for a given temperature and luminosity at fixed mass,
(T_eff, L) = R(Y, Z, α, age), where R is a one-to-many relation because of
the degeneracy of the problem. This relation is defined by using f to
interpolate the values of a grid covering all possible values of Y, Z, α
and age with a desired precision. This inverse relation is the main novelty
in our NN approach.
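
The architecture described in this section (one sigmoid hidden layer, a linear output layer, and a 60/20/20 partition of the data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of hidden neurons, the random weights, the example input values and the function names are hypothetical, the mapping direction (four modelling parameters in, two observables out) follows the grid simulation of Section 2.3, and no training step is shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One sigmoid hidden layer followed by a linear output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

def split_60_20_20(samples, seed=0):
    """Shuffle, then split into training, checking and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_check = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_check],
            shuffled[n_train + n_check:])

# Hypothetical sizes: 4 inputs (Y, Z, alpha, scaled age), 5 hidden neurons,
# 2 outputs (log Teff, log L). Weights are random, i.e. an untrained network.
rng = random.Random(1)
n_in, n_hidden, n_out = 4, 5, 2
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [rng.uniform(-1, 1) for _ in range(n_out)]

outputs = forward([0.27, 0.02, 1.2, 0.5], w_hidden, b_hidden, w_out, b_out)
train, check, valid = split_60_20_20(range(60000))
print(len(outputs), len(train), len(check), len(valid))
```

With 60,000 models, this partition would give 36,000 models for training and 12,000 each for checking and validation.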

2.3. Reasoning scheme 

Considering that our case study consists of 0.8 M⊙ population I stars, our
demonstrator took into account the range of input variables given in
Table 1. This range of parameters is well suited to the study of population
I stars located within the galactic disk (Cunha et al. 2003;
Fernandes and Monteiro 2003). Moreover, we set the precision of each
variable according to the bin size shown in Table 1. This gives us about
10^7 grid points representing the possible universe of stars. By simulating
the neural network we obtain their respective effective temperatures and
luminosities, thus completing the interpolation. Figures such as Fig. 1,
which in this case uses 1000 bins in effective temperature and luminosity,
help us to identify the regions of the HRD where we should expect the
highest degeneracy of model solutions.
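
A degeneracy map of this kind amounts to counting how many simulated grid points fall in each bin of the HRD. The sketch below shows one possible way to do this in Python; the function name degeneracy_map, the axis ranges and the three example points are assumptions chosen for illustration, not values from this work.

```python
from collections import Counter

def degeneracy_map(points, n_bins, teff_range, lum_range):
    """Count model solutions per (log Teff, log L) bin."""
    t_min, t_max = teff_range
    l_min, l_max = lum_range
    counts = Counter()
    for log_teff, log_lum in points:
        if not (t_min <= log_teff <= t_max and l_min <= log_lum <= l_max):
            continue  # skip points outside the mapped region
        i = min(int((log_teff - t_min) / (t_max - t_min) * n_bins), n_bins - 1)
        j = min(int((log_lum - l_min) / (l_max - l_min) * n_bins), n_bins - 1)
        counts[(i, j)] += 1
    return counts

# Toy (log Teff, log L) outputs of a simulated grid; values are invented.
points = [(3.670, -0.510), (3.671, -0.512), (3.710, -0.250)]
counts = degeneracy_map(points, n_bins=10,
                        teff_range=(3.60, 3.80), lum_range=(-1.0, 0.0))
most_degenerate_bin, n_solutions = counts.most_common(1)[0]
print(most_degenerate_bin, n_solutions)
```

Bins with the largest counts correspond to the regions of highest degeneracy.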

Training our network gives a mean squared error of 7.03 × 10^-7, which
indicates a very low error between input and output values, and a
regression value of 0.9997, very close to 1, representing a close
correlation between input and output values. Using 1000 bins in effective
temperature and luminosity (shown in Fig. 1) we notice that, for all bins,
the standard deviation of the mixing-length parameter ranges from 0.05 to
0.25. This allows us to identify, for instance, 0.8 M⊙ stars with a mixing
length smaller than the solar value. In turn, the standard deviation of the
initial metallicity is less than 0.007. Likewise, the standard deviation of
the initial helium abundance ranges between 0.005 and 0.03. Assuming that
the error on the parameter determination is three times this value, we can
estimate Y with an uncertainty between 0.015 and 0.09. In the range of
parameters evaluated here (0.23 < Y < 0.30), this corresponds to a relative
error between 5 and 39% (with an average relative error around 20%). This
accuracy in helium determination is competitive with methods based on grid
interpolations (Casagrande et al. 2007).
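
The helium error budget quoted above can be checked with a few lines of arithmetic. The sketch assumes that the lower bound pairs the smallest absolute error with the largest Y and the upper bound pairs the largest absolute error with the smallest Y; this pairing is an assumption of the example, chosen because it reproduces the quoted 5 to 39% interval.

```python
# Standard deviation of Y per bin and the range of Y explored (from the text).
sigma_min, sigma_max = 0.005, 0.03
y_min, y_max = 0.23, 0.30

# Error on the parameter taken as three times the standard deviation.
err_min, err_max = 3 * sigma_min, 3 * sigma_max

# Assumed pairing: best case at the largest Y, worst case at the smallest Y.
rel_min = err_min / y_max
rel_max = err_max / y_min
print(f"{100 * rel_min:.0f}% to {100 * rel_max:.0f}%")
```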

This concludes the description of the reasoning scheme, which will be
implemented and validated for any data set provided by different users.
At this stage, the input is a txt file with the input variables; the NN is
trained in Matlab, using the model described; and the result of the
training process (output) is another txt file. This output txt file is then
imported into the database (MySQL) included in the demonstrative web
interface already built. After querying the database with SQL syntax, text
files can be saved and used to plot graphics such as the one shown in
Fig. 1. After this stage we are confident that the network is well trained
and that the whole decision process works.
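
The text-file-to-database step of this workflow can be illustrated as follows. The sketch uses Python's built-in sqlite3 module purely as a stand-in for the MySQL database mentioned above; the table name, the column layout and the three rows of data are hypothetical and invented for the example.

```python
import sqlite3

# Hypothetical content of the output text file, one model per line:
# Y, Z, alpha, age (Myr), log Teff, log L/Lsun. Values are invented.
text_output = """\
0.27 0.0190 1.05 8200.0 3.674 -0.510
0.25 0.0160 1.20 9500.0 3.672 -0.520
0.29 0.0280 1.60 3000.0 3.720 -0.350
"""

# In-memory SQLite database used here in place of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE models "
    "(y REAL, z REAL, alpha REAL, age_myr REAL, log_teff REAL, log_lum REAL)"
)
rows = [tuple(float(v) for v in line.split())
        for line in text_output.splitlines()]
conn.executemany("INSERT INTO models VALUES (?, ?, ?, ?, ?, ?)", rows)

# Example query: models with a mixing length below 1.3, ordered by age.
selected = conn.execute(
    "SELECT y, z, alpha, age_myr FROM models "
    "WHERE alpha < ? ORDER BY age_myr",
    (1.3,),
).fetchall()
for row in selected:
    print(row)
conn.close()
```

The rows returned by such a query could then be written back to a text file for plotting.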

3. Application to CG Cyg B 

In order to illustrate our ANN's capability for the identification of
stellar model solutions we require a star with a mass similar to that of
the models used to train our tool. The eclipsing binary CG Cyg is located
well within the solar vicinity (Popper 1998). This constrains its
components' age and initial helium and metal abundances to lie well within
the range of modelling parameters used to train our ANN. Moreover, the fact
that CG Cyg's lower-mass component is a 0.810 ± 0.013 M⊙ star
(Popper 1994) makes it a suitable target for our ANN.

Figure 1: Degeneracy, i.e. number of model solutions, for 0.8 M⊙ stars,
using 1000 bins in T_eff and L. The red areas correspond to regions of the
HRD where we find the highest degeneracy of 0.8 M⊙ stellar model solutions.

Light curve analysis and I_C-band measurements allowed
Hillenbrand and White (2004) to estimate CG Cyg B's effective temperature
(log(T_eff) = 3.674 ± 0.006) and luminosity (log(L/L⊙) = -0.510 ± 0.030).
Taking into account these global stellar properties, our ANN identified
75412 sets of modelling parameters which reproduce, within the given
uncertainties, CG Cyg B's position in the HRD. These are shown in Fig. 2a.
Note that no model reproduces CG Cyg B's exact position in the HRD. Yet
this is to be expected, since this star is slightly more massive than
0.8 M⊙.
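
Selecting the model solutions compatible with the observations amounts to keeping the grid points that fall inside the observational error box. A minimal Python sketch of this selection is given below; the observed values are those quoted above for CG Cyg B, while the function name, the dictionary layout and the four toy grid models are assumptions made for illustration.

```python
# Observed position of CG Cyg B in the HRD (values quoted in the text).
LOG_TEFF, D_LOG_TEFF = 3.674, 0.006
LOG_LUM, D_LOG_LUM = -0.510, 0.030

def within_error_box(model):
    """True if the model lies inside the observational error box."""
    return (abs(model["log_teff"] - LOG_TEFF) <= D_LOG_TEFF
            and abs(model["log_lum"] - LOG_LUM) <= D_LOG_LUM)

# Toy grid of simulated models; all values are invented for this example.
grid = [
    {"y": 0.27, "z": 0.019, "alpha": 1.05, "log_teff": 3.676, "log_lum": -0.500},
    {"y": 0.25, "z": 0.016, "alpha": 1.20, "log_teff": 3.669, "log_lum": -0.535},
    {"y": 0.29, "z": 0.028, "alpha": 1.60, "log_teff": 3.690, "log_lum": -0.510},
    {"y": 0.24, "z": 0.014, "alpha": 1.10, "log_teff": 3.674, "log_lum": -0.450},
]

solutions = [m for m in grid if within_error_box(m)]
alpha_range = (min(m["alpha"] for m in solutions),
               max(m["alpha"] for m in solutions))
print(len(solutions), alpha_range)
```

Taking the minimum and maximum of each parameter over the selected models gives a range table of the kind shown in Table 2.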
The full range of possible modelling parameters is shown in Table 2. CG Cyg
B admits almost all possible values for the initial helium and metal
abundances (Figs. 2b and c). However, as seen in Fig. 2f, these parameters
are strongly correlated, presenting a linear correlation coefficient
r = 0.909 (corresponding to a false alarm probability smaller than 0.1%).
Indeed, a linear fit to the model solutions gives:

$$Z = -0.0275 + 0.1736\,Y, \qquad (1)$$

with an r.m.s. ≈ 0.0016. Figure 2f shows that, in most cases, the model's
initial helium and metal abundances are not in agreement with what is
expected from a helium-to-metal enrichment ratio ΔY/ΔZ = 2, which takes
into account the solar helium-to-metal proportion (Casagrande et al. 2007).
Knowledge of the star's metallicity would be valuable to test this
enrichment ratio. Figure 2h shows that there is some degree of correlation
between the model's age and its chemical composition. The best fit to the
data corresponds to:

$$\mathrm{age} = 3.46 \times 10^{4} - 1.527 \times 10^{5}\,Y + 7.653 \times 10^{5}\,Z \ \ (\mathrm{Myr}), \qquad (2)$$

with an r.m.s. ≈ 902. On average, the model solutions seem to be older than
the Sun (Fig. 2). This is reinforced by the fact that, for the bin size we
have selected, the age standard deviation of the best models (i.e. those
that best reproduce CG Cyg B's position in the HRD) is 1000 Myr. Likewise,
α is correlated with the model's chemical composition (Fig. 2g), with the
best fit:

$$\alpha = 1.537 - 3.462\,Y + 22.927\,Z, \qquad (3)$$

with an r.m.s. ≈ 0.055. Notice that every possible model has a mixing
length smaller than 1.3 (Table 2 and Figs. 2d and g). This means that
CG Cyg B's α is clearly smaller than the solar value.
Lastennet and Valls-Gabaud (2002) found no isochrone that could fit
CG Cyg B's position in the HR diagram, claiming that this star was far too
cold. Yet they assumed a solar mixing-length parameter. A lower
mixing-length parameter (like the ones reported here) shifts the isochrones
towards lower effective temperatures.

Table 2: Range of modelling parameters for CG Cyg B.

        Y              Z               α           age (Myr)
    min   max     min      max     min   max     min        max
    0.23  0.30    0.01196  0.03    1.0   1.3     1850.754   10000

Figure 2: a) Number of model solutions for CG Cyg B (100 bins in log(T_eff) and log(L/L⊙)). b) Average initial helium
abundance for the same bin size. c) Average metallicity for the same bin size. d) Average mixing length for the same bins.
e) Average age for the same bin size. The crosses correspond to CG Cyg B's exact position in the HRD. f) Number of
model solutions for CG Cyg B (0.07 bin size in Y and 0.022 in Z). The solid grey line corresponds to the best linear fit to the
model's initial helium and metal abundances. The dashed line corresponds to a helium-to-metal enrichment ratio ΔY/ΔZ = 2.
g) Average mixing-length parameter for different initial helium and metal abundances for the same bins. h) Average age for
different initial helium and metal abundances for the same bin size.
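
As a quick numerical illustration, the fitted relations of Eqs. (1) to (3) can be evaluated for a trial helium abundance. In the sketch below the coefficients are those of the fits; the function names and the trial value Y = 0.27 are assumptions made for the example. The last line converts the slope of Eq. (1) into the corresponding dY/dZ along the fitted line, for comparison with the enrichment ratio of 2 discussed in the text.

```python
def z_from_y(y):
    """Eq. (1): metallicity along the best-fit line."""
    return -0.0275 + 0.1736 * y

def age_myr(y, z):
    """Eq. (2): age in Myr."""
    return 3.46e4 - 1.527e5 * y + 7.653e5 * z

def alpha_ml(y, z):
    """Eq. (3): mixing-length parameter."""
    return 1.537 - 3.462 * y + 22.927 * z

y_trial = 0.27                      # illustrative value inside 0.23 < Y < 0.30
z_trial = z_from_y(y_trial)
age_trial = age_myr(y_trial, z_trial)
alpha_trial = alpha_ml(y_trial, z_trial)
print(f"Z = {z_trial:.5f}, age = {age_trial:.0f} Myr, alpha = {alpha_trial:.3f}")

# Slope of Eq. (1) expressed as dY/dZ along the fitted line.
dy_dz_fit = 1.0 / 0.1736
print(f"dY/dZ along the fit = {dy_dz_fit:.2f}")
```

For this trial value the three relations return a metallicity, age and mixing length inside the ranges of Table 2, and the slope of the fitted line is visibly steeper than 2.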

Note that CG Cyg B is slightly more massive than the models used here. Thus
it is important to assess the impact that this mass difference can have on
the parameter estimation. Studies of main sequence stars such as the
components of the UV Psc binary (Lastennet et al. 2003) or subgiants like
β Hyi, and evolutionary models (Fernandes and Monteiro 2003;
Pinheiro and Fernandes 2010), can provide useful hints. In comparison with
0.80 M⊙ models, higher-mass evolutionary tracks are shifted towards higher
effective temperatures and luminosities. Moreover, 0.81 M⊙ stars evolve
faster. Therefore, CG Cyg B's age should be slightly smaller than our
models' predictions. The models computed by Lastennet et al. (2003) and
Pinheiro and Fernandes (2010) can be used to estimate the mass dependence
of the mixing length. Roughly speaking, this dependence, Δα/ΔM, should be
around -4, i.e. our models overestimate CG Cyg B's mixing-length parameter
by about 0.04, which further reinforces our conclusions regarding this
parameter. The same models can be used to derive a similar ΔY/ΔM ratio.
For instance, the overlap between Pinheiro and Fernandes's evolutionary
tracks of 0.90 M⊙, Y = 0.29 and 0.92 M⊙, Y = 0.25 stars indicates that our
models overestimate CG Cyg B's helium abundance by about 0.005.
Lastennet et al.'s models hint at a similar result. Finally, the same
reasoning applied to the metallicity gives a ΔZ/ΔM ratio around -0.1, i.e.
an overestimation of CG Cyg B's metallicity by about 0.001. This is close
to what we obtain if we apply our helium overestimation to Eq. (1).
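
These first-order corrections follow from multiplying each mass dependence by the mass difference between CG Cyg B and the models. The short sketch below reproduces the arithmetic; the variable names are assumptions, while the mass dependences, the helium overestimation and the slope of Eq. (1) are the values quoted in the text.

```python
# Mass difference between CG Cyg B and the 0.80 Msun models (in Msun).
delta_mass = 0.81 - 0.80

# Approximate mass dependences quoted in the text.
d_alpha_d_mass = -4.0
d_z_d_mass = -0.1

alpha_shift = d_alpha_d_mass * delta_mass   # correction to alpha
z_shift = d_z_d_mass * delta_mass           # correction to Z

# Cross-check: helium overestimation (0.005) propagated through the slope
# of Eq. (1), dZ/dY = 0.1736.
z_shift_from_eq1 = 0.1736 * 0.005

print(f"alpha shift = {alpha_shift:.3f}")
print(f"Z shift = {z_shift:.4f}, from Eq. (1): {z_shift_from_eq1:.4f}")
```

The two metallicity estimates agree to within about 10^-4, which is the sense in which the text calls them close.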

4. Conclusions 

The strong correlation observed between the initial helium and metal
abundances of CG Cyg B's 0.80 M⊙ model solutions is similar to the one that
can be seen in the analysis of UV Psc by Lastennet et al. (2003). Yet no
numerical relationship is explicitly given there. Moreover, we observe a
correlation between the model's chemical composition (Y and Z) and both its
age and mixing-length parameter. Finally, notice that for all possible
models the latter parameter (α) is smaller than the solar value. This could
explain why, for binary stars like CG Cyg, isochrones fail to fit both
components' positions in the HRD at the same time.

Generally, we can further constrain the modelling parameters of a given
star by taking into account direct [Fe/H] observations and/or
asteroseismic data. In the particular case of CG Cyg, we can simultaneously
analyse both components (which should have the same age and the same
initial helium and metal abundances). That leaves us with 5 unknown
parameters (Y, Z, age, α_A, α_B) against 6 known global properties: M_A,
T_eff,A, L_A, M_B, T_eff,B, L_B. Yet this is beyond the scope of the
present work. Indeed, this work aims to show the adequacy of ANNs for the
identification of modelling parameters which reproduce known global stellar
properties and, consequently, their usefulness for analysing the degeneracy
of stellar model solutions.

In this work we applied an ANN in order to perform a regression between the
4 parameters that we are searching for and the two observables. The way our
ANN is implemented ensures the reliability of this regression. Also notice
that, by treating the modelling parameters as free, we do not run the risk
of suffering the consequences of assuming the wrong values. Additionally,
once the ANN has been trained, our tool is able to identify the model
solutions faster than approaches such as PSwarm (Fernandes et al. 2011)
which, at each iteration, require the computation of additional stellar
models. This is particularly important in situations where a large number
of stars have to be analysed. Likewise, our method's accuracy in helium
determination is competitive with methods based on grid interpolations
(Casagrande et al. 2007). On the other hand, the tool presented here is
useful for the study of the degeneracy of stellar model solutions.
Generally, only a small number of solutions are evaluated (e.g.
Fernandes and Monteiro 2003), while here we have access to a large range of
model solutions, allowing us to analyse the degeneracy problem as a whole.

Having shown the suitability of our method, we plan to apply our ANN to a
larger sample of stars. This includes not only individual stars (for which
the metal abundance may or may not be known), but also binary stars and
stellar populations, particularly in clusters. This will allow studies of
the chemical evolution of galaxies. Yet, in order to do so, our neural
network has to be trained using other stellar masses. The choice of the
mass intervals needs to take into account the impact that this choice will
have on the determination of stellar parameters, which will be slightly
different for different regions of the HRD. In any case, a reasoning scheme
similar to the one followed here for CG Cyg B will be valuable.

In the future we plan to implement models with different masses in the
web-based tool, i.e. using the mass as an additional output parameter. The
final result will be a tool that accepts txt files as inputs, trains and
validates the ANN and produces outputs (both in text format and in
graphical form).

As a final remark, we recall that, like all works relying on the use of
stellar evolutionary models, our ANN is limited by the set of models used
to train it. This stresses the importance of using the right physical
ingredients and assumptions.